Test Set Bounds for Relational Data that Vary with Strength of Dependence
Abstract
A large portion of the data collected in application domains such as online social networking, finance, and biomedicine is relational in nature. Statistical Relational Learning (SRL), a subfield of machine learning, is concerned with performing statistical inference on relational data. A defining property that separates relational data from independently and identically distributed (i.i.d.) data is the existence of correlations between individual datapoints. A major portion of the theory developed in machine learning assumes that the data is i.i.d. In this paper we develop theory for the relational setting. In particular, we derive distribution-free bounds on the generalization error of a classifier for the relational setting, where the class of data generation models we consider is inspired by the type of joint distributions represented by relational classification models developed by the SRL community. A key aspect of the bound we derive is that its tightness is a function of the strength of dependence between related datapoints, with the bound reducing to the standard Hoeffding's or McDiarmid's inequality when there is no dependence. To the best of our knowledge, this is the first bound for relational data whose tightness varies with the strength of dependence. Moreover, the bound provides insight into the computation of effective sample size, an important notion introduced by Jensen and Neville (2002).
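For orientation, the no-dependence baseline that the abstract refers to can be sketched as follows. This is only the standard one-sided Hoeffding test set bound for i.i.d. test points and a loss in [0, 1]; it is not the relational bound derived in the paper, whose form the abstract does not state. The function name and parameters are illustrative assumptions.

```python
import math


def hoeffding_test_set_bound(empirical_error: float, n: int, delta: float) -> float:
    """Upper bound on the true error of a fixed classifier.

    Assumes n i.i.d. test points and a loss bounded in [0, 1].
    The bound holds with probability at least 1 - delta.
    This is the i.i.d. baseline only, not the paper's relational bound.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    # Hoeffding: P(true_error - empirical_error > eps) <= exp(-2 * n * eps**2).
    # Setting the right-hand side to delta and solving for eps gives:
    epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    # An error rate cannot exceed 1, so the bound is capped there.
    return min(1.0, empirical_error + epsilon)


if __name__ == "__main__":
    # Illustrative numbers only: 10% test error, 95% confidence.
    for n in (100, 1000, 10000):
        print(n, hoeffding_test_set_bound(0.10, n, 0.05))
```

The margin shrinks as the number of independent test points grows. Under dependence, fewer datapoints carry independent information, which is the intuition behind relating the bound to an effective sample size.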
Publication date: 2009